Abstract

The Behrens-Fisher problem concerns testing the equality of the means of
two normal populations with possibly different variances. The null hypothesis
in this problem induces a statistical model for which the likelihood function may
have more than one local maximum. We show that such multimodality contradicts
the null hypothesis in the sense that if this hypothesis is true, then the
probability of multimodality converges to zero when both sample sizes tend to
infinity. Additional results include a finite-sample bound on the probability of
multimodality under the null and asymptotics for the probability of multimodality
under the alternative.

Keywords: Algebraic statistics; Discriminant; Heteroscedasticity; Maximum likelihood 
estimation; Two-sample t-test. 

1 Introduction 

The Behrens-Fisher problem is concerned with testing

$$H_0: \mu_X = \mu_Y \quad \text{vs.} \quad H_1: \mu_X \neq \mu_Y,$$

where $\mu_X$ and $\mu_Y$ are the means of two normal populations with possibly different variances
$\sigma_X^2$ and $\sigma_Y^2$. An interesting aspect of the problem is that the likelihood equations for the
model induced by $H_0$ may have more than one solution. In fact, with probability one, there
will be either one or three solutions, with the two cases corresponding to one or two local
maxima of the likelihood function. According to simulations of Sugiura and Gupta (1987),
three solutions to the likelihood equations occur infrequently if the observations are drawn
from a distribution in $H_0$. In this note we provide an explanation for this rare occurrence of
multiple solutions by proving that under $H_0$ the probability of this event converges to zero
when both sample sizes $n$ and $m$ tend to infinity (Corollary 3). This and more general large-sample results
about the probability of multiple solutions (Proposition 2 and Theorem 6) are based on two
observations. First, solving the likelihood equations amounts to solving one cubic polynomial
equation. Second, the number of real roots of a cubic can be determined using the cubic
discriminant. The discriminant criterion also allows us to derive a finite-sample bound on
the null probability of multiple solutions to the likelihood equations (Proposition 1).

While arguments can be given for using a likelihood ratio-based test instead of Welch's
approximate t-test in the Behrens-Fisher problem (Jensen, 1992), the latter test is widely
used in practice and avoids maximization of the likelihood function under the null hypothesis.
In that sense the practical implications of our study are perhaps not immediate.
However, in more general models involving heteroscedastic structures, statistical practice
often relies on likelihood ratio tests that do require solving the maximization problem. Our
results provide geometric intuition about this problem in the simple univariate Behrens-Fisher
model (Figure 1), for which it holds, rather reassuringly, that the likelihood function
for the null model is asymptotically unimodal if the model is correctly specified. It would
be interesting to obtain generalizations of this fact for other, more complicated models.



2 Solving the likelihood equations 

We start out by deriving a convenient form of the likelihood equations for the three- 
parameter model induced by the null hypothesis in the Behrens-Fisher problem. 



2.1 A cubic equation 



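Let $X_1, \dots, X_n \sim N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m \sim N(\mu_Y, \sigma_Y^2)$ be two independent normal
samples. Under the null hypothesis $H_0$, $\mu_X$ is equal to $\mu_Y$ and we denote this common
mean by $\mu$. The log-likelihood function for the null model can be written as

$$\ell(\mu, \sigma_X^2, \sigma_Y^2) = -\frac{n+m}{2}\log(2\pi) - \frac{n}{2}\log(\sigma_X^2) - \frac{m}{2}\log(\sigma_Y^2) - \frac{n}{2\sigma_X^2}\left[\hat\sigma_X^2 + (\bar X - \mu)^2\right] - \frac{m}{2\sigma_Y^2}\left[\hat\sigma_Y^2 + (\bar Y - \mu)^2\right].$$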

Here, $\bar X$ and $\bar Y$ are the two sample means, and

$$\hat\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2$$

is the empirical variance for the first sample; the second empirical variance $\hat\sigma_Y^2$ is defined
analogously. If $\min(n, m) \ge 2$, then both $\hat\sigma_X^2$ and $\hat\sigma_Y^2$ are positive with probability one. This
sample size condition will be assumed throughout.
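The log-likelihood above depends on the data only through $\bar X$, $\bar Y$, $\hat\sigma_X^2$ and $\hat\sigma_Y^2$. A minimal Python sketch of the formula follows; the function name `null_log_likelihood` is a hypothetical choice, and the variances are passed as $\sigma_X^2$ and $\sigma_Y^2$, matching the parametrization of $\ell$.

```python
import math


def null_log_likelihood(x, y, mu, var_x, var_y):
    """Log-likelihood of the null model mu_X = mu_Y = mu, written through the
    sample means and the empirical variances (divisors n and m, as in the text)."""
    n, m = len(x), len(y)
    if min(n, m) < 2:
        raise ValueError("the text assumes min(n, m) >= 2")
    xbar, ybar = sum(x) / n, sum(y) / m
    s2x = sum((xi - xbar) ** 2 for xi in x) / n  # empirical variance, divisor n
    s2y = sum((yi - ybar) ** 2 for yi in y) / m  # empirical variance, divisor m
    return (-(n + m) / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(var_x) - m / 2 * math.log(var_y)
            - n / (2 * var_x) * (s2x + (xbar - mu) ** 2)
            - m / (2 * var_y) * (s2y + (ybar - mu) ** 2))
```

Summing the $N(\mu, \sigma_X^2)$ and $N(\mu, \sigma_Y^2)$ log-densities of the individual observations gives the same number, because $\sum_{i=1}^n (X_i - \mu)^2 = n\hat\sigma_X^2 + n(\bar X - \mu)^2$.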
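The partial derivatives of the log-likelihood function are

$$\frac{\partial \ell}{\partial \mu} = \frac{n(\bar X - \mu)}{\sigma_X^2} + \frac{m(\bar Y - \mu)}{\sigma_Y^2}$$

and

$$\frac{\partial \ell}{\partial \sigma_X^2} = -\frac{n}{2\sigma_X^2} + \frac{n\left[\hat\sigma_X^2 + (\bar X - \mu)^2\right]}{2\sigma_X^4};$$

the partial derivative for $\sigma_Y^2$ is analogous. Let $r_n = n/m$. Then the likelihood equations
obtained by setting the three partial derivatives to zero are equivalent to the polynomial
equations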



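$$r_n(\bar X - \mu)\sigma_Y^2 + (\bar Y - \mu)\sigma_X^2 = 0, \tag{2.1}$$

$$\sigma_X^2 = (\bar X - \mu)^2 + \hat\sigma_X^2, \tag{2.2}$$

$$\sigma_Y^2 = (\bar Y - \mu)^2 + \hat\sigma_Y^2. \tag{2.3}$$

Here, equivalence means that the two solution sets are almost surely equal. Plugging the
expressions for $\sigma_X^2$ and $\sigma_Y^2$ from (2.2) and (2.3) into (2.1) yields the cubic equation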



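$$f(\mu) = a_3\mu^3 + a_2\mu^2 + a_1\mu + a_0 = 0 \tag{2.4}$$

with

$$\begin{aligned}
a_3 &= 1 + r_n,\\
a_2 &= -(2\bar X + \bar Y) - r_n(2\bar Y + \bar X),\\
a_1 &= \bar X^2 + 2(1 + r_n)\bar X\bar Y + r_n\bar Y^2 + \hat\sigma_X^2 + r_n\hat\sigma_Y^2, \text{ and}\\
a_0 &= -\bar X^2\bar Y - r_n\bar Y^2\bar X - \hat\sigma_X^2\bar Y - r_n\hat\sigma_Y^2\bar X.
\end{aligned}$$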

Hence, the maximum likelihood estimator $\hat\mu$ can be computed in closed form by solving the
univariate cubic equation (2.4). We remark that the manipulations leading to the polynomial
equations (2.2), (2.3) and (2.4) form a trivial case of a computation of a lexicographic
Gröbner basis (Pachter and Sturmfels, 2005, p. 86).
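The reduction to (2.4) translates directly into code. The sketch below illustrates it under our own assumptions rather than serving as a reference implementation: it uses `numpy.roots` for the cubic instead of Cardano's formula, treats roots whose imaginary part is below the ad hoc threshold `imag_tol` as real, and the helper names are hypothetical.

```python
import numpy as np


def cubic_coefficients(xbar, ybar, s2x, s2y, r):
    """Coefficients (a3, a2, a1, a0) of the cubic f in (2.4); r = n / m."""
    a3 = 1 + r
    a2 = -(2 * xbar + ybar) - r * (2 * ybar + xbar)
    a1 = xbar**2 + 2 * (1 + r) * xbar * ybar + r * ybar**2 + s2x + r * s2y
    a0 = -xbar**2 * ybar - r * ybar**2 * xbar - s2x * ybar - r * s2y * xbar
    return a3, a2, a1, a0


def null_likelihood_solutions(x, y, imag_tol=1e-9):
    """All real solutions (mu, var_x, var_y) of the likelihood equations (2.1)-(2.3)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    s2x, s2y = x.var(), y.var()  # default ddof=0: divisors n and m, as in the text
    roots = np.roots(cubic_coefficients(xbar, ybar, s2x, s2y, x.size / y.size))
    # Keep the numerically real roots of f.
    mus = sorted(float(z.real) for z in roots if abs(z.imag) < imag_tol)
    # Back-substitute each root into (2.2) and (2.3).
    return [(mu, (xbar - mu) ** 2 + s2x, (ybar - mu) ** 2 + s2y) for mu in mus]
```

When three real solutions are returned, two of them correspond to local maxima of the likelihood, and the maximum likelihood estimate is the candidate with the larger log-likelihood.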



2.2 The discriminant 

A quadratic polynomial $a_2x^2 + a_1x + a_0$ in the indeterminate $x$ may have no, one, or two
(distinct) real roots. Which of the three cases applies is determined by the sign of the
discriminant $a_1^2 - 4a_0a_2$. In the Behrens-Fisher problem we are led to the cubic polynomial
$f$ in (2.4). A cubic always has at least one real root, and so we would like to know whether
it has one, two, or three distinct real roots. This can again be decided based on the sign of the
discriminant, which for the cubic takes the form

$$\Delta = a_1^2a_2^2 - 4a_0a_2^3 - 4a_1^3a_3 + 18a_0a_1a_2a_3 - 27a_0^2a_3^2.$$
If $\Delta > 0$, then $f$ has three distinct real roots. If $\Delta < 0$, then $f$ has a unique real root and
two complex conjugate ones. If $\Delta = 0$, then $f$ has either one real root of multiplicity three or two
distinct real roots, one of which has multiplicity two. These and more general results on
discriminants can be found, for example, in Basu et al. (2003, §4.1-2).
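The discriminant criterion can be checked on small cubics with known roots. In the sketch below (hypothetical function names), integer coefficients keep the arithmetic exact; with floating-point coefficients, the sign of a value near $\Delta = 0$ is subject to rounding.

```python
def cubic_discriminant(a3, a2, a1, a0):
    """Discriminant of a3*x**3 + a2*x**2 + a1*x + a0, in the form displayed above."""
    return (a1**2 * a2**2 - 4 * a0 * a2**3 - 4 * a1**3 * a3
            + 18 * a0 * a1 * a2 * a3 - 27 * a0**2 * a3**2)


def real_root_pattern(a3, a2, a1, a0):
    """Classify the roots of a cubic (a3 != 0) by the sign of its discriminant."""
    disc = cubic_discriminant(a3, a2, a1, a0)
    if disc > 0:
        return "three distinct real roots"
    if disc < 0:
        return "one real root, two complex conjugate roots"
    return "a multiple real root"


# (x - 1)(x - 2)(x - 3) = x**3 - 6x**2 + 11x - 6 has three distinct real roots.
print(real_root_pattern(1, -6, 11, -6))
```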



The coefficients $a_0$, $a_1$ and $a_2$ of $f$ in (2.4) are random variables with a continuous
distribution, and $a_3$ is a constant. Consequently, $\Delta$ is also a continuous random variable,
so the event $\{\Delta = 0\}$ occurs with probability zero. In other words, the Behrens-Fisher
likelihood equations almost surely have one or three real solutions.

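The discriminant $\Delta$ is a homogeneous polynomial of degree 6 in $\bar X$, $\bar Y$, $\hat\sigma_X$ and $\hat\sigma_Y$, and
depends on $\bar X$ and $\bar Y$ only through their difference. Consequently, with probability one, the sign
of $\Delta$ depends only on $r_n$ and the two ratios $\hat\gamma = \hat\sigma_X/\hat\sigma_Y$ and $\hat\delta = (\bar X - \bar Y)/\hat\sigma_Y$. This follows
because $\Delta = \hat\sigma_Y^6 \cdot D$ with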



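$$D = r_n^2\hat\delta^6 - 2\hat\delta^4\left[(2 + 2r_n - r_n^2)\hat\gamma^2 + (2r_n^3 + 2r_n^2 - r_n)r_n\right] - \hat\delta^2\left[(8 + 8r_n - r_n^2)\hat\gamma^4 - 2r_n(10r_n^2 + 19r_n + 10)\hat\gamma^2 + (8r_n^2 + 8r_n - 1)r_n^2\right] - 4(1 + r_n)(r_n + \hat\gamma^2)^3. \tag{2.5}$$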



While $\Delta$ (and $D$) remain unchanged if $n$ and $m$ are replaced by $\tilde n$ and $\tilde m$ with $\tilde n/\tilde m = n/m = r_n$,
such a change of sample sizes affects the sampling distribution of $\Delta$ (and $D$)
and thus the probability of multiple solutions to the likelihood equations. We remark that
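instead of working with $\hat\delta$ one could work with the more symmetric quantities

$$\frac{\bar X - \bar Y}{\sqrt{\tfrac{1}{2}\left(\hat\sigma_X^2 + \hat\sigma_Y^2\right)}} \qquad \text{or} \qquad \frac{\bar X - \bar Y}{\sqrt{\left(\hat\sigma_X^2 + r_n\hat\sigma_Y^2\right)/(1 + r_n)}}.$$

However, such a substitution would increase the degree of the analog of (2.5), so
we keep working with $\hat\delta$ in the sequel.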
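The identity $\Delta = \hat\sigma_Y^6 \cdot D$ can be spot-checked numerically. The following sketch (hypothetical helper names; `D` is a direct transcription of (2.5)) evaluates the discriminant of the coefficients in (2.4) for arbitrary summary statistics and compares it with $\hat\sigma_Y^6 D(\hat\gamma, \hat\delta)$.

```python
import math


def cubic_coefficients(xbar, ybar, s2x, s2y, r):
    """Coefficients (a3, a2, a1, a0) of the cubic f in (2.4); r = n / m."""
    a3 = 1 + r
    a2 = -(2 * xbar + ybar) - r * (2 * ybar + xbar)
    a1 = xbar**2 + 2 * (1 + r) * xbar * ybar + r * ybar**2 + s2x + r * s2y
    a0 = -xbar**2 * ybar - r * ybar**2 * xbar - s2x * ybar - r * s2y * xbar
    return a3, a2, a1, a0


def cubic_discriminant(a3, a2, a1, a0):
    """Discriminant of a3*x**3 + a2*x**2 + a1*x + a0."""
    return (a1**2 * a2**2 - 4 * a0 * a2**3 - 4 * a1**3 * a3
            + 18 * a0 * a1 * a2 * a3 - 27 * a0**2 * a3**2)


def D(gamma, delta, r):
    """The polynomial D of (2.5) at (gamma, delta), with r in place of r_n."""
    g2, d2 = gamma**2, delta**2
    return (r**2 * d2**3
            - 2 * d2**2 * ((2 + 2*r - r**2) * g2 + (2*r**3 + 2*r**2 - r) * r)
            - d2 * ((8 + 8*r - r**2) * g2**2 - 2*r * (10*r**2 + 19*r + 10) * g2
                    + (8*r**2 + 8*r - 1) * r**2)
            - 4 * (1 + r) * (r + g2)**3)


def normalized_statistics(xbar, ybar, s2x, s2y):
    """Return (gamma_hat, delta_hat) = (sigma_hat_X / sigma_hat_Y, (xbar - ybar) / sigma_hat_Y)."""
    sy = math.sqrt(s2y)
    return math.sqrt(s2x) / sy, (xbar - ybar) / sy
```

The same helper also reproduces two facts used later in the text: $D_r(\gamma, 0) = -4(1 + r)(r + \gamma^2)^3 < 0$, and $D$ becomes positive for large $|\delta|$ at fixed $\gamma$.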

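For any given value of $r_n$, the polynomial $D = D_{r_n}$ in the indeterminates $\gamma$ and $\delta$ defines
an algebraic curve. Figure 1 shows two examples of these curves over the statistically
relevant region with $\gamma > 0$. Since $D$ depends on $\gamma$ and $\delta$ only through $\gamma^2$ and $\delta^2$, the curve is
symmetric with respect to both axes; the curve for $r_n = 1$ has four cusp points at
$(\gamma, \delta) = (\pm 1, \pm 2)$, and the cusps for $r_n = 4$ are at $(\gamma, \delta) = (\pm\sqrt{27/2}, \pm\sqrt{25/2})$. In general, the
four cusps have coordinates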



$$\gamma = \pm\frac{(2r_n + 1)\sqrt{(r_n + 2)r_n(2r_n + 1)}}{(r_n + 2)^2}, \qquad \delta = \pm\frac{3(1 + r_n)\sqrt{3(r_n + 2)r_n}}{(r_n + 2)^2}. \tag{2.6}$$
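The cusp formula (2.6) can be checked by evaluating $D_{r_n}$ and its gradient at the stated coordinates; both should vanish at a singular point. In this sketch the gradient is derived by hand from (2.5) through $u = \gamma^2$ and $w = \delta^2$, and all helper names are hypothetical.

```python
import math


def _coefficients(r):
    """Coefficients A, B, C, E, F of (2.5), grouped as in the text."""
    return (2 + 2*r - r**2, (2*r**3 + 2*r**2 - r) * r, 8 + 8*r - r**2,
            2*r * (10*r**2 + 19*r + 10), (8*r**2 + 8*r - 1) * r**2)


def D(gamma, delta, r):
    """D_r(gamma, delta) from (2.5)."""
    A, B, C, E, F = _coefficients(r)
    u, w = gamma**2, delta**2
    return r**2*w**3 - 2*w**2*(A*u + B) - w*(C*u**2 - E*u + F) - 4*(1 + r)*(r + u)**3


def grad_D(gamma, delta, r):
    """(dD/dgamma, dD/ddelta) via the chain rule through u = gamma**2, w = delta**2."""
    A, B, C, E, F = _coefficients(r)
    u, w = gamma**2, delta**2
    dD_du = -2*A*w**2 - w*(2*C*u - E) - 12*(1 + r)*(r + u)**2
    dD_dw = 3*r**2*w**2 - 4*w*(A*u + B) - (C*u**2 - E*u + F)
    return 2*gamma*dD_du, 2*delta*dD_dw


def cusp(r):
    """The cusp of (2.6) in the quadrant gamma, delta > 0."""
    gamma = (2*r + 1) * math.sqrt((r + 2) * r * (2*r + 1)) / (r + 2)**2
    delta = 3 * (1 + r) * math.sqrt(3 * (r + 2) * r) / (r + 2)**2
    return gamma, delta
```

For $r = 1$ the helper `cusp` returns $(1, 2)$, and for $r = 4$ it returns $(\sqrt{27/2}, \sqrt{25/2})$ up to rounding, matching the values quoted above.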



Figure 1: Algebraic curve defined by the polynomial $D_{r_n}$ that is derived from the discriminant:
(a) $r_n = 1$ and (b) $r_n = 4$. In each plot, points $(\gamma, \delta)$ between the two curve branches
correspond to a unique real root of the Behrens-Fisher likelihood equations. Points above
and below the curves correspond to three distinct real roots.



The curve has two asymptotes, namely, $r_n\delta = \pm 2\sqrt{1 + r_n}\,\gamma$.
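The slopes of the asymptotes are the directions in which the degree-6 homogeneous part of $D$ vanishes. The sketch below (hypothetical helper names) collects those terms of (2.5) and checks that they vanish along $r\delta = \pm 2\sqrt{1 + r}\,\gamma$.

```python
import math


def leading_form(gamma, delta, r):
    """Terms of (2.5) of total degree 6 in (gamma, delta)."""
    return (r**2 * delta**6
            - 2 * (2 + 2*r - r**2) * gamma**2 * delta**4
            - (8 + 8*r - r**2) * gamma**4 * delta**2
            - 4 * (1 + r) * gamma**6)


def asymptote_slope(r):
    """Slope delta / gamma of the asymptote r * delta = 2 * sqrt(1 + r) * gamma."""
    return 2 * math.sqrt(1 + r) / r
```

Since the remaining terms of $D$ have total degree at most 4, $D(\gamma, s\gamma)/\gamma^6$ tends to the value of the degree-6 form at $(1, s)$ as $\gamma \to \infty$, which is consistent with the stated slopes.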

The two branches of each curve in Figure 1 enclose the region $\{D < 0\}$, which
contains the (horizontal) $\gamma$-axis. Over this region the discriminant $\Delta$ is negative and the
Behrens-Fisher likelihood equations have a unique real root. As the figure shows, neither the region
$\{D < 0\}$ nor the region $\{D > 0\}$ need be convex. When fixing $\gamma$ and increasing $\delta$, $D$
will eventually remain positive because the leading term of $D$, when viewed as a univariate
polynomial in $\delta$, is $r_n^2\delta^6$ with $r_n^2 > 0$. This means that bimodal likelihood functions arise
when the difference between the means $\bar X$ and $\bar Y$ of the two samples is large compared to
the empirical standard deviations $\hat\sigma_X$ and $\hat\sigma_Y$. However, as can be seen in Figure 1(b) with $r_n = 4$,
there may exist values of $\gamma$ such that the values of $\delta$ corresponding to unimodal likelihood
functions do not form an interval.



2.3 Finite-sample bound 

A finite-sample study of the probability of one versus three solutions to the Behrens-Fisher 
likelihood equations seems difficult. However, under the null hypothesis, we can give a very 
simple bound. 

Proposition 1. Let the random variable $T$ have a $t$-distribution with $m - 1$ degrees of
freedom. Let $\gamma = \sigma_X/\sigma_Y$. If the null hypothesis $H_0$ is true, i.e., if $\mu_X = \mu_Y$, then the
probability of three distinct real solutions to the Behrens-Fisher likelihood equations is smaller
than

$$P\left(|T| > \frac{3r_n(1 + r_n)\sqrt{3(r_n + 2)(m - 1)}}{(r_n + 2)^2\sqrt{r_n + \gamma^2}}\right).$$
Proof. Three solutions occur if $(\hat\gamma, \hat\delta)$ falls in the region $\{D > 0\}$. This region is strictly
contained in the region of pairs $(\gamma, \delta)$ that have $|\delta| > c_n$ with

$$c_n = \frac{3(1 + r_n)\sqrt{3(r_n + 2)r_n}}{(r_n + 2)^2};$$

compare Figure 1 and (2.6). Hence, $P(D > 0)$ is smaller than $P(|\hat\delta| > c_n)$. Under $H_0$,
$\bar X - \bar Y \sim N(0, \sigma_X^2/n + \sigma_Y^2/m)$ is independent of $m\hat\sigma_Y^2/\sigma_Y^2 \sim \chi^2_{m-1}$, so that

$$T = \hat\delta\,\sqrt{\frac{r_n(m - 1)}{r_n + \gamma^2}}$$

is distributed according to the $t$-distribution with $m - 1$ degrees of freedom. Expressing the
event $\{|\hat\delta| > c_n\}$ in terms of this $t$-random variable yields the claim. □

Suppose, for example, that the samples are of equal size and the standard deviation $\sigma_X$
is half the standard deviation $\sigma_Y$, i.e., $r_n = 1$ and $\gamma = 1/2$. Then, by Proposition 1,
the probability of three distinct real solutions to the Behrens-Fisher likelihood equations is
smaller than 0.023 if $n = m = 5$, 0.00045 if $n = m = 10$, and 0.00001 if $n = m = 15$. Hence,
despite its crude nature, the bound informs us that the probabilities are small. Monte Carlo
simulations suggest that the three considered probabilities are in fact a factor of 10 or more
smaller than the stated bounds.
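The bound of Proposition 1 is easy to evaluate. To keep the sketch dependency-free, it integrates the $t$ density with Simpson's rule, which is an implementation choice on our part; with SciPy available, `2 * scipy.stats.t.sf(c, m - 1)` computes the same two-sided tail. The function names are hypothetical.

```python
import math


def t_two_sided_tail(c, df, steps=4000):
    """P(|T| > c) for T ~ t with df degrees of freedom, via Simpson's rule on [0, c]."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))

    h = c / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(c)
    total += sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, steps))
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def prop1_bound(n, m, gamma):
    """Bound of Proposition 1 under H0; gamma = sigma_X / sigma_Y is the true ratio."""
    r = n / m
    c = (3 * r * (1 + r) * math.sqrt(3 * (r + 2) * (m - 1))
         / ((r + 2) ** 2 * math.sqrt(r + gamma ** 2)))
    return t_two_sided_tail(c, m - 1)
```

For $n = m = 5$ and $\gamma = 1/2$ the threshold is $4/\sqrt{1.25} \approx 3.58$, and `prop1_bound(5, 5, 0.5)` is approximately 0.023.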



3 Large-sample results 

We begin our study of the large-sample behaviour of the likelihood equations with the case 
when the discriminant converges almost surely to a non-zero limit. 

Proposition 2. Suppose $\min(n, m) \to \infty$ such that $r_n = n/m \to r \in (0, \infty)$. Let
$\delta = (\mu_X - \mu_Y)/\sigma_Y$ and $\gamma = \sigma_X/\sigma_Y$. Define $D_r(\gamma, \delta)$ to be the quantity obtained from $D$ in (2.5)
by replacing $r_n$ by $r$ and $(\hat\gamma, \hat\delta)$ by $(\gamma, \delta)$.

(i) If $D_r(\gamma, \delta) < 0$, then the probability that the Behrens-Fisher likelihood equations have
exactly one real solution converges to one.

(ii) If $D_r(\gamma, \delta) > 0$, then the probability that the Behrens-Fisher likelihood equations have
three distinct real solutions converges to one.

Proof. The polynomial $D$ is a continuous function of $\bar X$, $\bar Y$, $\hat\sigma_X$ and $\hat\sigma_Y$ (and of $r_n$). Applying laws of
large numbers to the four random variables, we find that $D_{r_n}(\hat\gamma, \hat\delta)$ converges almost surely
to $D_r(\gamma, \delta)$. In case (i), $D_r(\gamma, \delta)$ is negative, and thus $P(D_{r_n}(\hat\gamma, \hat\delta) < 0)$ converges to one,
which implies the claim. Case (ii) is analogous. □
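Proposition 2 can be illustrated by simulation. The sketch below (hypothetical names; standard library only) draws samples with $\mu_Y = 0$ and $\sigma_Y = 1$, which loses no generality because $\hat\gamma$ and $\hat\delta$ are unchanged by a common change of location and positive scale, and uses the sign of $D$ from (2.5) to count how often the likelihood equations have three real solutions.

```python
import math
import random


def D(gamma, delta, r):
    """D_r(gamma, delta) from (2.5)."""
    g2, d2 = gamma**2, delta**2
    return (r**2 * d2**3
            - 2 * d2**2 * ((2 + 2*r - r**2) * g2 + (2*r**3 + 2*r**2 - r) * r)
            - d2 * ((8 + 8*r - r**2) * g2**2 - 2*r * (10*r**2 + 19*r + 10) * g2
                    + (8*r**2 + 8*r - 1) * r**2)
            - 4 * (1 + r) * (r + g2)**3)


def prob_three_roots(delta, gamma, n, m, reps=200, seed=0):
    """Monte Carlo estimate of P(three distinct real solutions) with mu_Y = 0,
    sigma_Y = 1, mu_X = delta and sigma_X = gamma."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(delta, gamma) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        xbar, ybar = sum(x) / n, sum(y) / m
        s2x = sum((v - xbar) ** 2 for v in x) / n
        s2y = sum((v - ybar) ** 2 for v in y) / m
        gamma_hat = math.sqrt(s2x / s2y)
        delta_hat = (xbar - ybar) / math.sqrt(s2y)
        # Delta = sigma_hat_Y**6 * D, so the sign of D decides the number of real roots.
        if D(gamma_hat, delta_hat, n / m) > 0:
            hits += 1
    return hits / reps
```

At $(\gamma, \delta) = (1, 0)$, a point of $H_0$, the estimate should be essentially zero, whereas at $(\gamma, \delta) = (1, 5)$, where $D_1(1, 5) = 9261 > 0$, it should be essentially one; the exact estimates depend on the seed.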

The next result is a corollary to both Propositions 1 and 2.
Corollary 3. Suppose $H_0$ is true, i.e., $\mu_X = \mu_Y = \mu$. If $\min(n, m) \to \infty$ and $r_n = n/m \to
r \in (0, \infty)$, then the probability that the Behrens-Fisher likelihood equations have exactly one
real solution converges to one.

Proof. If $\mu_X = \mu_Y$, then $\delta = 0$ and the claim follows from Proposition 2 because $D_r(\gamma, 0) =
-4(1 + r)(r + \gamma^2)^3$ is negative. □

Proposition 2 does not apply to the situation when $D_r(\gamma, \delta)$ is zero. However, these critical
cases can be studied using asymptotics similar to those encountered with likelihood ratio
tests. The resulting asymptotic probabilities will depend on whether or not the point $(\gamma, \delta)$
is a singular point of the curve defined by the vanishing of $D_r$.

Definition 4. Let $h$ be a polynomial in the ring $\mathbb{R}[x_1, x_2]$ of polynomials in the indeterminates $x_1$
and $x_2$ with real coefficients. Let $V(h)$ be the algebraic curve $\{x \in \mathbb{R}^2 \mid h(x) = 0\}$. A point
$x \in V(h)$ is a singular point if the gradient $\nabla h(x)$ is zero.

Our curve of interest, $V(D_r)$, has four singular points whose coordinates were given in
(2.6); recall Figure 1. All other points on $V(D_r)$ are non-singular.

We will show that the critical behaviour of the number of real roots to the Behrens-Fisher
likelihood equations is determined by the local geometry of the curve $V(D_r)$ at the
true parameter values $(\gamma, \delta)$. This geometry is captured in the tangent cone.

Definition 5. The tangent cone of $F \subseteq \mathbb{R}^2$ at $x \in F$ is the set of vectors that are limits of
sequences $a_n(x_n - x)$, where $a_n$ are positive reals and $x_n \in F$ converge to $x$.

The tangent cone, which is a closed set, is indeed a cone. This means that if $\tau$ is in the
tangent cone, then so is the half-ray $\{\lambda\tau \mid \lambda \ge 0\}$.

Theorem 6. Suppose that $\min(n, m) \to \infty$ and $r_n = r + o(1/\sqrt{n})$. Let $\gamma > 0$.

(i) If $(\gamma, \delta)$ is a non-singular point of the curve $V(D_r)$, then the probability of exactly one
real solution as well as the probability of three distinct real solutions to the Behrens-Fisher
likelihood equations converge to 1/2.

(ii) If $(\gamma, \delta)$ is one of the two singular points of the curve $V(D_r)$ that have $\gamma > 0$, then the
probability of exactly one real solution converges to one.

Proof. We first show that the asymptotic probability can be obtained from a distance between
a normal random point and a tangent cone. Different types of tangent cones will then
be shown to lead to results (i) and (ii).

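Let $W(D_{r_n})$ be the set of points $(\gamma, \delta) \in (0, \infty) \times \mathbb{R}$ such that $D_{r_n}(\gamma, \delta) \le 0$. Let

$$\lambda_n = n \cdot \min_{(\tilde\gamma, \tilde\delta) \in W(D_{r_n})} \left[(\hat\gamma - \tilde\gamma)^2 + (\hat\delta - \tilde\delta)^2\right]$$

be the squared and scaled distance between the random point $(\hat\gamma, \hat\delta)$ and $W(D_{r_n})$. In Figure 1
the set $W(D_{r_n})$ corresponds to the region between and including the two curve branches.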






The Behrens-Fisher likelihood equations have three distinct real solutions if and only if
$D_{r_n}(\hat\gamma, \hat\delta) > 0$, if and only if $\lambda_n > 0$.

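By the central limit theorem and the delta method, the two random variables $A_n =
\sqrt{n}(\hat\gamma - \gamma)$ and $B_n = \sqrt{n}(\hat\delta - \delta)$ converge jointly to a centered bivariate normal distribution
$N_2(0, \Sigma)$. In order to make use of this convergence, we rewrite

$$\lambda_n = \min_{(\tilde\gamma, \tilde\delta) \in W(D_{r_n})} \left[A_n - \sqrt{n}(\tilde\gamma - \gamma)\right]^2 + \left[B_n - \sqrt{n}(\tilde\delta - \delta)\right]^2.$$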

The limits, for $n \to \infty$, of convergent sequences of the form $\sqrt{n}\,[(\gamma_n, \delta_n) - (\gamma, \delta)]$ with
$(\gamma_n, \delta_n) \in W(D_{r_n})$ form the tangent cone $T(\gamma, \delta)$ of the set $W(D_r)$ at $(\gamma, \delta)$. It thus follows
from van der Vaart (1998, Lemma 7.13) that, as $n$ tends to infinity, the random distance $\lambda_n$
converges in distribution to the distance

$$\lambda_\infty = \min_{(\tilde\gamma, \tilde\delta) \in T(\gamma, \delta)} (Z_1 - \tilde\gamma)^2 + (Z_2 - \tilde\delta)^2$$

between the normal random vector $Z = (Z_1, Z_2) \sim N_2(0, \Sigma)$ and $T(\gamma, \delta)$.

Case (i): If $(\gamma, \delta)$ is a non-singular point of $V(D_r)$, then $T(\gamma, \delta)$ is a half-space $H$
comprising all points on and to one side of a line through the origin. The normal vector of this
line is given by the gradient $\nabla D_r(\gamma, \delta)$. The probability $P(\lambda_\infty > 0) = P(Z \notin H) = P(\Sigma^{-1/2}Z \notin \Sigma^{-1/2}H)$
is equal to 1/2 because $\Sigma^{-1/2}Z \sim N_2(0, I)$ is standard normal and because $\Sigma^{-1/2}H$ is still a
half-space with the origin on its boundary. Since $P(D_{r_n}(\hat\gamma, \hat\delta) > 0) = P(\lambda_n > 0)$ converges
to $P(\lambda_\infty > 0)$, we have established claim (i).

Case (ii): If $(\gamma, \delta)$ is a singular point, then $T(\gamma, \delta)$ is all of $\mathbb{R}^2$; compare Figure 1. Thus
$P(\lambda_\infty > 0) = 0$, which implies claim (ii). □
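Theorem 6(i) concerns points of $V(D_r)$ that are not cusps. The sketch below (hypothetical helper names) locates such a point for given $\gamma$ and $r$ by bisection in $\delta$, using $D_r(\gamma, 0) < 0$ from the proof of Corollary 3, and checks that the gradient there is non-zero, so that the tangent cone of $W(D_r)$ at that point is a half-space.

```python
import math


def _coefficients(r):
    """Coefficients A, B, C, E, F of (2.5), grouped as in the text."""
    return (2 + 2*r - r**2, (2*r**3 + 2*r**2 - r) * r, 8 + 8*r - r**2,
            2*r * (10*r**2 + 19*r + 10), (8*r**2 + 8*r - 1) * r**2)


def D(gamma, delta, r):
    """D_r(gamma, delta) from (2.5)."""
    A, B, C, E, F = _coefficients(r)
    u, w = gamma**2, delta**2
    return r**2*w**3 - 2*w**2*(A*u + B) - w*(C*u**2 - E*u + F) - 4*(1 + r)*(r + u)**3


def gradient_norm(gamma, delta, r):
    """Euclidean norm of the gradient of D_r, via the chain rule through u and w."""
    A, B, C, E, F = _coefficients(r)
    u, w = gamma**2, delta**2
    dD_du = -2*A*w**2 - w*(2*C*u - E) - 12*(1 + r)*(r + u)**2
    dD_dw = 3*r**2*w**2 - 4*w*(A*u + B) - (C*u**2 - E*u + F)
    return math.hypot(2*gamma*dD_du, 2*delta*dD_dw)


def boundary_delta(gamma, r, hi=10.0, iters=200):
    """Bisection for some delta in (0, hi) with D_r(gamma, delta) = 0.

    Relies on D_r(gamma, 0) < 0 and requires D_r(gamma, hi) > 0."""
    lo = 0.0
    if D(gamma, hi, r) <= 0:
        raise ValueError("D is not positive at delta = hi; increase hi")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if D(gamma, mid, r) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For $r = 1$ and $\gamma = 1/2$ this gives $\delta \approx 2.70$.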

For the curious reader, we remark that the tangent cone to the curve $V(D_r)$ at its singular
point $(\gamma, \delta)$ with $\gamma, \delta > 0$ is the half-ray of points $(\gamma, \delta)$ with $\delta \ge 0$ and $\gamma\sqrt{3(2r + 1)} =
\delta(r - 1)$. If $r = 1$, then this half-ray is the non-negative $\delta$-axis. The half-ray has positive
slope if $r > 1$. The slope is negative if $r < 1$.
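The tangent direction at the cusp can be checked through the Hessian of $D_r$, which we expect to have rank one at a cusp; its kernel then spans the tangent line. The sketch below (hypothetical helper names) computes the Hessian by hand from (2.5), using that the first derivatives vanish at the cusp, and compares the kernel with the direction $(r - 1, \sqrt{3(2r + 1)})$; the test checks the rank-one property only for a few values of $r$.

```python
import math


def cusp(r):
    """Cusp of V(D_r) with gamma, delta > 0, from (2.6)."""
    gamma = (2*r + 1) * math.sqrt((r + 2) * r * (2*r + 1)) / (r + 2)**2
    delta = 3 * (1 + r) * math.sqrt(3 * (r + 2) * r) / (r + 2)**2
    return gamma, delta


def hessian_at_cusp(r):
    """Entries (a, b, c) of the Hessian [[a, b], [b, c]] of D_r at the cusp.

    With u = gamma**2 and w = delta**2, dD/du and dD/dw vanish at the cusp,
    so there a = 4u D_uu, b = 4 gamma delta D_uw and c = 4w D_ww."""
    g, d = cusp(r)
    u, w = g * g, d * d
    A = 2 + 2*r - r**2
    B = (2*r**3 + 2*r**2 - r) * r
    C = 8 + 8*r - r**2
    E = 2*r * (10*r**2 + 19*r + 10)
    D_uu = -2*C*w - 24*(1 + r)*(r + u)
    D_uw = -4*A*w - (2*C*u - E)
    D_ww = 6*r**2*w - 4*(A*u + B)
    return 4*u*D_uu, 4*g*d*D_uw, 4*w*D_ww


def tangent_direction(r):
    """A vector spanning the kernel of the rank-one Hessian at the cusp."""
    a, b, _ = hessian_at_cusp(r)
    # [[a, b], [b, c]] @ (b, -a) = (0, b*b - a*c), which is zero when the rank is one.
    return b, -a
```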

We conclude by illustrating the results obtained in this section in Figure 2, which shows
simulations on the probability of three distinct real roots to the Behrens-Fisher likelihood
equations. This figure addresses the case $\gamma = r = 1$, in which $\delta = \mu_X - \mu_Y$. The simulations
confirm Theorem 6(ii) because the probability of three distinct real roots appears to converge
to zero if $\delta = 2$, in which case $(\gamma, \delta) = (1, 2)$ is a singular point of $V(D_1)$.